Rewriting Saves Extracted Summaries
Abstract
In automated editing, text summarization plays an important role in creating digests of contents; a digest helps users find the desired information quickly. In our automated editing system, which makes FAQ-like information packages from USENET articles, summaries are extracted from the original articles. Extracted summaries are useful for selecting the desired information; however, they can be improved into summaries more appropriate to the package. In this paper, we discuss the limitations of the current summary-extraction method and propose the employment of rewriting as a solution.

Introduction

Automated editing (Sato 1997; Sato & Sato 1997) is a new and ambitious idea: an automated system edits information packages from electronic materials. Our first application aimed to edit Frequently Asked Questions (FAQ) from USENET articles automatically; at this stage, we have built a system that generates a Questions and Answers Package (QA-Pack), a substitute for a FAQ, that is updated every day.

The basic structure of QA-Pack is a classification tree (concept tree), in which a node represents a classification concept; that is, QA-Pack consists of hierarchically classified threads [1]. QA-Pack provides two major navigation mechanisms: the table of contents and a digest page per node (concept). Figure 1 shows the table of contents of the Sun QA-Pack in English [2]. The table-of-contents page shows the first two levels of the concept tree and provides users with an overview of it.

[1] A thread is a group of articles that share the same subject line (Subject:) in their headers. In Q&A newsgroups such as comp.sys.sun.admin and fj.sys.sun, the first article of a thread is a question article about Sun workstations and the rest are response articles.
[2] There is also a Japanese Sun QA-Pack. The English QA-Pack is made from articles of comp.sys.sun.admin and the Japanese QA-Pack is made from articles of fj.sys.sun. Both QA-Packs are open at http://www-sato.jalst.ac.jp:8000/faq/.
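The structure described above (a concept tree whose nodes hold threads, where a thread is a group of articles sharing a subject line) can be sketched in a few lines of Python. This is only an illustrative sketch, not the system's actual implementation: the names Article, Thread, ConceptNode, normalize_subject, and group_into_threads are hypothetical, and stripping a leading "Re:" prefix before comparing subject lines is an assumption the text does not state.

```python
import re
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Article:
    subject: str  # value of the Subject: header
    body: str


@dataclass
class Thread:
    subject: str
    articles: list = field(default_factory=list)

    @property
    def question(self):
        # In a Q&A newsgroup the first article of a thread is the question.
        return self.articles[0]

    @property
    def responses(self):
        # The remaining articles are responses.
        return self.articles[1:]


@dataclass
class ConceptNode:
    concept: str                                   # classification concept
    children: list = field(default_factory=list)   # child ConceptNode objects
    threads: list = field(default_factory=list)    # threads classified here


def normalize_subject(subject):
    # Assumption (not from the paper): replies carry a leading "Re:" that
    # must be removed so they share a subject line with the question.
    return re.sub(r"^(\s*re:\s*)+", "", subject, flags=re.IGNORECASE).strip()


def group_into_threads(articles):
    # Group articles by normalized subject line, preserving arrival order.
    threads = OrderedDict()
    for article in articles:
        key = normalize_subject(article.subject)
        if key not in threads:
            threads[key] = Thread(subject=key)
        threads[key].articles.append(article)
    return list(threads.values())
```

Under these assumptions, a node page for a concept would be rendered from the threads list of one ConceptNode, and the table of contents from the first two levels of children.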
Figure 2 shows a digest for one concept, serial port. This type of page is called a node page because it corresponds to a node of the concept tree. On this page, each thread is represented by a headline and a summary. A headline denotes the thread's topic and a summary complements the headline; a headline has a hyperlink to the related thread page (Figure 3).

Generating QA-Pack from USENET articles requires many kinds of editing work. Among them, there are two central tasks (Figure 4).
1. Determine what concept node a thread should belong to.
2. Generate a headline and a summary from each thread.
Obviously, the first task is an issue of text classification; the second is an issue of text summarization.

In this paper, we discuss text summarization in the automated editing system of QA-Pack. First we give an overview of the summary-extraction method of the current system. Then we discuss the limitations of the method and propose a solution.

Summary Extraction in QA-Pack Editing

The word summarize is too general to guide the implementation of a summarization module; before implementation, we have to clarify what kind of summary is required in the specific application. Usually, two basic questions must be answered at this stage: (1) what we use summaries for, and (2) how we use summaries. For our application, we first designed the style of the node page (Figure 2), where summaries are used. The purpose of the node page is to provide an overview of the questions or problems related to a specific concept; the node page helps readers find the similar

From: AAAI Technical Report SS-98-06. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.
Similar resources
Sentence Reduction for Automatic Text Summarization
We present a novel sentence reduction system for automatically removing extraneous phrases from sentences that are extracted from a document for summarization purpose. The system uses multiple sources of knowledge to decide which phrases in an extracted sentence can be removed, including syntactic knowledge, context information, and statistics computed from a corpus which consists of examples w...
Sentence Reduction for Automatic Text Summarization Motivation
We present a novel sentence reduction system for automatically removing extraneous phrases from sentences that are extracted from a document for summarization purpose. The system uses multiple sources of knowledge to decide which phrases in an extracted sentence can be removed, including syntactic knowledge, context information, and statistics computed from a corpus which consists of examples w...
Rewriting Queries over Summaries of Big Data Graphs
This short paper reports on the benefits that traversal queries over existing graph stores (such as RDF databases) can gain from a class of optimizations based on summaries. Summaries, also known as structural indexes, have been extensively covered in the literature (see [2] for a brief overview). Despite this, summary-based optimizations are not widely implemented. To make both graph traversal...
Improving the Coherence of Multi-document Summaries: a Corpus Study for Modeling the Syntactic Realization of Entities
References included in multi-document summaries are often problematic. In this paper, we present a corpus study performed to derive statistical models for the syntactic realization of referential expressions. Our work shows how the syntactic realization of entities can influence the coherence of the text and provides a model for rewriting references in multidocument summaries to smooth disfluen...
Term Graph Narrowing 3
We introduce term graph narrowing as an approach for solving equations by transformations on term graphs. Term graph narrowing combines term graph rewriting with first-order term unification. Our main result is that this mechanism is complete for all term rewriting systems over which term graph rewriting is normalizing and confluent. This includes, in particular, all convergent term rewriting syste...